Introduction

Baseball is arguably the most data-driven sport in the world today and has a rich recorded history of statistics. The use of sabermetrics to aid managerial decision-making has exploded in recent years, and the methods used to evaluate players have become increasingly complex, making baseball a fascinating game to explore. The goal of any franchise’s front office should be to minimize performance volatility and maximize wins. While baseball far transcends dollar signs and analytics for fans, it is nonetheless a business and must be treated as such to generate winning seasons consistently. Among other things, our report sheds light on responsible benchmarks for player salaries, the richest recruiting epicenters for MLB scouts, and some of the statistics most critical to a campaign that endures through October. The data supporting our analysis are drawn from Sean Lahman’s online baseball database, a repository of .csv files with records dating back to the nineteenth century.

Summary Paragraph

This summary looks at the 2017 league MVPs. In the plots below, we examine how much colleges and salaries matter to a team’s success. Here, we ask how much they matter to an individual player’s performance.

The 2017 American League MVP was Jose Altuve, who made $3,687,500 in 2016. In comparison, the National League MVP, Giancarlo Stanton, made $9,000,000 in 2016. Surprisingly, neither of them attended college in the U.S. (as indicated by the NA values returned when we looked them up in the college data). We will explore similar questions further in the final deliverable.

World Series Winners Since 2000

A leading objective of our analysis is to identify the most impactful statistics in the game and determine performance thresholds that can predict certain degrees of success, at both the individual and the team level. Our insights mirror the research that scouts and front-office employees use to make major budgeting decisions across the league, and the following table is an excellent example of how precedent can inform those decisions. Below is a list of World Series champions since 2000 and their corresponding hitting, pitching, and fielding statistics.

Table 1: World Series Winners Since 2000

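Column groups: Hitting (BA, OBP, SLG, OPS); Pitching & Fielding (ERA, WHIP, SOA, FP). RS = runs scored, RA = runs allowed, DIFF = RS minus RA, SOA = strikeouts by pitchers, FP = fielding percentage.
Year Team W L RS RA DIFF BA OBP SLG OPS ERA WHIP SOA FP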
2000 New York Yankees 87 74 871 814 +57 0.277 0.354 0.450 0.804 4.76 1.43 1040 0.981
2001 Arizona Diamondbacks 92 70 818 677 +141 0.267 0.341 0.442 0.783 3.87 1.24 1297 0.986
2002 Los Angeles Angels of Anaheim 99 63 851 644 +207 0.282 0.341 0.433 0.774 3.69 1.28 999 0.986
2003 Florida Marlins 91 71 751 692 +59 0.266 0.333 0.421 0.754 4.04 1.35 1132 0.987
2004 Boston Red Sox 98 64 949 768 +181 0.282 0.360 0.472 0.832 4.18 1.29 1132 0.981
2005 Chicago White Sox 99 63 741 645 +96 0.262 0.322 0.425 0.747 3.61 1.25 1040 0.985
2006 St. Louis Cardinals 83 78 781 762 +19 0.269 0.337 0.431 0.768 4.54 1.38 970 0.984
2007 Boston Red Sox 96 66 867 657 +210 0.279 0.362 0.444 0.806 3.87 1.27 1149 0.986
2008 Philadelphia Phillies 92 70 799 680 +119 0.255 0.332 0.438 0.770 3.88 1.36 1081 0.985
2009 New York Yankees 103 59 915 753 +162 0.283 0.362 0.478 0.840 4.26 1.35 1260 0.985
2010 San Francisco Giants 92 70 697 583 +114 0.257 0.321 0.408 0.729 3.36 1.27 1331 0.988
2011 St. Louis Cardinals 90 72 762 692 +70 0.273 0.341 0.425 0.766 3.74 1.31 1098 0.982
2012 San Francisco Giants 94 68 718 649 +69 0.269 0.327 0.397 0.724 3.68 1.27 1237 0.981
2013 Boston Red Sox 97 65 853 656 +197 0.277 0.349 0.446 0.795 3.79 1.30 1294 0.987
2014 San Francisco Giants 88 74 665 614 +51 0.255 0.311 0.388 0.699 3.50 1.17 1211 0.984
2015 Kansas City Royals 95 67 724 641 +83 0.269 0.322 0.412 0.734 3.73 1.28 1160 0.985
2016 Chicago Cubs 103 58 808 556 +252 0.256 0.343 0.429 0.772 3.15 1.11 1441 0.983
2017 Houston Astros 101 61 896 700 +196 0.282 0.346 0.478 0.824 4.12 1.27 1593 0.983
2018 Boston Red Sox 108 54 876 647 +229 0.268 0.339 0.453 0.792 3.75 1.25 1558 0.987
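
Two of the columns in Table 1 are simple functions of others: DIFF is runs scored minus runs allowed, and OPS is OBP plus SLG. The short Python sketch below recomputes both for three rows copied from the table. The function names (`derived_stats`, `summarize`) are our own illustrative choices, not part of the Lahman database or of the code behind the table.

```python
# Rows copied from Table 1: (year, team, RS, RA, OBP, SLG).
ROWS = [
    (2014, "San Francisco Giants", 665, 614, 0.311, 0.388),
    (2016, "Chicago Cubs", 808, 556, 0.343, 0.429),
    (2018, "Boston Red Sox", 876, 647, 0.339, 0.453),
]


def derived_stats(rs, ra, obp, slg):
    """Return (run differential, OPS) for one team season."""
    diff = rs - ra             # DIFF = runs scored - runs allowed
    ops = round(obp + slg, 3)  # OPS = on-base percentage + slugging percentage
    return diff, ops


def summarize(rows):
    """Map (year, team) to its derived statistics."""
    return {
        (year, team): derived_stats(rs, ra, obp, slg)
        for year, team, rs, ra, obp, slg in rows
    }


if __name__ == "__main__":
    for (year, team), (diff, ops) in summarize(ROWS).items():
        print(f"{year} {team}: DIFF {diff:+d}, OPS {ops:.3f}")
```

Recomputing derived columns this way is a cheap consistency check on a table that was assembled from several source files.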

American All-Stars by Cities

In this section, we analyze all-star players since 2000 and the cities where they attended college. The purpose of the chart below is to help recruiters scout talent in more “productive” cities.

Both the size and the color of the dots encode the same quantity: the total number of all-stars produced. Areas home to SEC and ACC schools seem to produce more talent than other regions. California is also notable in that it has produced the most all-stars of any state.

Fig. 1: Visualization of All-Star Talents by Cities

This plot is important to our project because scouting is a big part of a team’s future success. By knowing where to scout, a team can spend its scouting budget more efficiently.

In the final product, we hope to make the year selectable, add more options such as batting average and home runs, and modify the map to give state-level summaries.
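
The per-city totals behind a map like Fig. 1 come down to a filter followed by a group-and-count. The sketch below shows one way to do that in Python. All rows are hypothetical placeholders, and two choices are assumptions on our part rather than a description of how the figure was built: each player is counted once regardless of how many all-star selections he has, and players with no college record are skipped.

```python
from collections import Counter

# Hypothetical rows, shaped like all-star appearances: (player_id, year).
ALLSTARS = [
    ("player_a", 2001), ("player_a", 2003), ("player_b", 2005),
    ("player_c", 1998), ("player_d", 2010),
]

# Hypothetical lookup: player_id -> (city, state) of the college attended.
# A missing key stands for a player with no college on record (the NA case).
COLLEGES = {
    "player_a": ("City A", "ST"),
    "player_b": ("City A", "ST"),
    "player_c": ("City B", "ST"),
}


def allstars_by_city(allstars, colleges, since=2000):
    """Count distinct all-star players per college city from `since` onward."""
    players = {pid for pid, year in allstars if year >= since}
    counts = Counter()
    for pid in players:
        city = colleges.get(pid)
        if city is not None:  # skip players without a college record
            counts[city] += 1
    return counts


if __name__ == "__main__":
    for city, n in allstars_by_city(ALLSTARS, COLLEGES).most_common():
        print(city, n)
```

Making `since` a parameter also fits the plan to let the reader select the year range.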

Top 100 Players (by Salary) and Batting Average

Fig. 2: Top 100 Highest-paid Players’ Salaries and Their Batting Average.

This scatter plot describes the relationship between players’ salaries and their batting averages (hits divided by at-bats). Note that only the 100 highest-paid MLB players from the 2016 season are shown.

From this chart, we can see that Daniel Murphy was one of the most underrated players in MLB relative to his pay: he earned an annual salary of $8 million while posting a batting average of about 0.35 in the 2016 season.
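
The two steps behind a plot like Fig. 2, ranking players by salary and computing batting average as hits divided by at-bats, can be sketched in a few lines of Python. The player records below are hypothetical placeholders rather than real 2016 data, and the function names are illustrative. Dropping players with zero at-bats is our assumption about how missing averages should be handled.

```python
def batting_average(hits, at_bats):
    """BA = hits / at-bats; None when the player has no at-bats."""
    return hits / at_bats if at_bats else None


def top_paid_with_ba(players, n=100):
    """Return (name, salary, BA) for the n highest-paid players.

    Players among the top n who have no at-bats are dropped, so the
    result may hold fewer than n entries.
    """
    ranked = sorted(players, key=lambda p: p["salary"], reverse=True)[:n]
    result = []
    for p in ranked:
        ba = batting_average(p["H"], p["AB"])
        if ba is not None:
            result.append((p["name"], p["salary"], round(ba, 3)))
    return result


# Hypothetical records for illustration only.
PLAYERS = [
    {"name": "player_a", "salary": 8_000_000, "H": 175, "AB": 500},
    {"name": "player_b", "salary": 25_000_000, "H": 120, "AB": 480},
    {"name": "player_c", "salary": 30_000_000, "H": 0, "AB": 0},
    {"name": "player_d", "salary": 1_000_000, "H": 90, "AB": 300},
]

if __name__ == "__main__":
    for name, salary, ba in top_paid_with_ba(PLAYERS, n=3):
        print(f"{name}: ${salary:,} BA {ba:.3f}")
```

In this toy data, `player_a` plays the role of a high-average, modestly paid hitter, the same pattern the chart highlights.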

Player Salary Distribution

Fig. 3: Visualization of distribution of baseball player salaries in Major League Baseball for the year 2016.

As in any sport, baseball has some top players who are valued more highly than others, and it is interesting to see how this shows up in the distribution of player salaries. The plot is heavily concentrated at the bottom, showing that most players made less than $10 million in salary that year. At the top end of the spectrum, the highest earners made upwards of $30 million in 2016. The range of salaries is clearly wide: a few players made significantly more than the majority.
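
The shape described above can also be summarized with a few numbers: the median, the maximum, and the share of players below a threshold such as $10 million. The Python sketch below computes these for a hypothetical list of salaries. The values are placeholders chosen to mimic a right-skewed distribution, not figures from the 2016 data.

```python
from statistics import median


def salary_summary(salaries, threshold=10_000_000):
    """Summarize a salary list: count, median, max, and share below threshold."""
    ordered = sorted(salaries)
    below = sum(1 for s in ordered if s < threshold)
    return {
        "n": len(ordered),
        "median": median(ordered),
        "max": ordered[-1],
        "share_below": below / len(ordered),
    }


# Hypothetical salaries in dollars, for illustration only.
SALARIES = [
    507_500, 520_000, 1_000_000, 2_500_000,
    4_000_000, 8_000_000, 15_000_000, 32_000_000,
]

if __name__ == "__main__":
    summary = salary_summary(SALARIES)
    print(f"median ${summary['median']:,.0f}, max ${summary['max']:,}")
    print(f"{summary['share_below']:.0%} of players below $10M")
```

A median far below the maximum is the numerical counterpart of a plot that is crowded at the bottom with a handful of points at the top.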